Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm
Abstract
NLP tasks are often limited by the scarcity of manually annotated data. In social media sentiment analysis and related tasks, researchers have therefore used binarized emoticons and specific hashtags as forms of distant supervision. Our paper shows that by extending the distant supervision to a more diverse set of noisy labels, the models can learn richer representations. Through emoji prediction on a dataset of 1246 million tweets containing one of 64 common emojis, we obtain state-of-the-art performance on 8 benchmark datasets within emotion, sentiment and sarcasm detection using a single pretrained model. Our analyses confirm that the diversity of our emotional labels yields a performance improvement over previous distant supervision approaches.
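As a rough illustration of the distant supervision idea described in the abstract, the sketch below turns raw tweets into (text, label) pairs by treating an emoji found in the tweet as a noisy label and stripping it from the text. It is not the authors' pipeline: the four-emoji `TARGET_EMOJIS` list is a hypothetical stand-in for the 64-emoji label set, the function names are invented for illustration, and the rule of keeping only tweets with exactly one distinct target emoji is an assumption made to keep the example small.

```python
# Hypothetical, tiny stand-in for the 64 common emojis mentioned in the abstract.
TARGET_EMOJIS = ["😂", "😍", "😭", "😡"]


def make_distant_example(tweet, targets=TARGET_EMOJIS):
    """Return (text_without_emoji, label_index), or None if the tweet is unusable.

    Assumption (not from the source): a tweet is kept only if it contains
    exactly one distinct target emoji, which then serves as its noisy label.
    """
    present = [e for e in targets if e in tweet]
    if len(present) != 1:
        return None
    emoji = present[0]
    # Remove every occurrence of the label emoji and normalise whitespace,
    # so a model cannot simply read the label off the input text.
    text = " ".join(tweet.replace(emoji, " ").split())
    if not text:
        return None
    return text, targets.index(emoji)


def build_dataset(tweets, targets=TARGET_EMOJIS):
    """Build a list of (text, label_index) pairs from an iterable of tweets."""
    examples = (make_distant_example(t, targets) for t in tweets)
    return [ex for ex in examples if ex is not None]
```

A classifier trained to predict `label_index` from `text` on pairs like these could then, in principle, be reused as a pretrained model for downstream sentiment, emotion or sarcasm tasks, which is the setting the abstract describes.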
Similar resources
Emotion Analysis of Twitter Data That Use Emoticons and Emoji Ideograms
Twitter is an online social networking service on which users worldwide publish their opinions on a variety of topics, discuss current issues, complain, and express many kinds of emotions. Therefore, Twitter is a rich source of data for opinion mining, sentiment and emotion analysis. This paper focuses on this issue by analysing symbols called emotion tokens, including emotion symbols (e.g. emo...
Approaches for Computational Sarcasm Detection: A Survey
Sentiment Analysis deals not only with the positive and negative sentiment detection in the text but it also considers the prevalence and challenges of sarcasm in sentiment-bearing text. Automatic Sarcasm detection deals with the detection of sarcasm in text. In recent years, work in sarcasm detection has gained popularity and has wide applicability in sentiment analysis. This paper compiles the...
emoji2vec: Learning Emoji Representations from their Description
Many current natural language processing applications for social media rely on representation learning and utilize pre-trained word embeddings. There currently exist several publicly-available, pre-trained sets of word embeddings, but they contain few or no emoji representations even as emoji usage in social media has increased. In this paper we release emoji2vec, pre-trained embeddings for all...
Joint Emoji Classification and Embedding Learning
In conversation scenarios, emoji are widely used to express human feelings, which greatly enriches the representation of plain text. People produce a large number of utterances containing emoji on social media platforms every day, which gives emoji a great influence on human life. For the academic community, researchers are always with the help of utterances including emoji as annotated...
CrystalNest at SemEval-2017 Task 4: Using Sarcasm Detection for Enhancing Sentiment Classification and Quantification
This paper describes a system developed for a shared sentiment analysis task and its subtasks organized by SemEval-2017. A key feature of our system is the embedded ability to detect sarcasm in order to enhance the performance of sentiment classification. We first constructed an affect-cognition-sociolinguistics sarcasm features model and trained an SVM-based classifier for detecting sarcastic e...
Publication date: 2017